Formant trajectories for acoustic-to-articulatory inversion
Abstract
This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. Formant frequencies and formant spectral amplitudes are automatically estimated from audio and treated as observations for estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepstral coefficient (MFCC) observations is modified by using formants and energies to either replace or augment the MFCC observation vector. The augmented observation results in 3.4% lower RMS error and 2.7% higher correlation coefficient than the baseline MFCC observation. The improvement is especially pronounced for plosive consonants, possibly because formant tracking provides information about the acoustic resonances that would otherwise be unavailable during plosive closure and release.
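As a rough illustration of the two ingredients named in the abstract, the sketch below shows how a per-frame MFCC vector might be augmented with formant frequencies and amplitudes, and how RMS error and correlation coefficient could be computed per EMA channel. It does not reproduce the paper's regression model. All function names, array shapes, and feature dimensions are assumptions made for the example, not details taken from the paper.

```python
import numpy as np


def augment_observations(mfcc, formant_freqs, formant_amps):
    """Concatenate per-frame features along the feature axis.

    All inputs are (n_frames, n_features) arrays; the dimensions used
    in the usage example are hypothetical.
    """
    return np.concatenate([mfcc, formant_freqs, formant_amps], axis=1)


def rms_error(pred, target):
    """Root-mean-square error per articulatory channel (column)."""
    return np.sqrt(np.mean((pred - target) ** 2, axis=0))


def correlation(pred, target):
    """Pearson correlation per articulatory channel (column).

    Assumes no channel is constant over time, otherwise the
    denominator is zero.
    """
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return (p * t).sum(axis=0) / denom


# Usage with synthetic data standing in for real features and EMA tracks.
rng = np.random.default_rng(0)
n_frames = 200
mfcc = rng.normal(size=(n_frames, 13))          # assumed 13 MFCCs
formant_freqs = rng.normal(size=(n_frames, 4))  # assumed 4 formants
formant_amps = rng.normal(size=(n_frames, 4))
obs = augment_observations(mfcc, formant_freqs, formant_amps)

ema_true = rng.normal(size=(n_frames, 12))      # assumed 12 EMA coordinates
ema_pred = ema_true + 0.1 * rng.normal(size=ema_true.shape)
rmse_per_channel = rms_error(ema_pred, ema_true)
corr_per_channel = correlation(ema_pred, ema_true)
```

Replacing rather than augmenting the MFCC vector would amount to passing only the formant arrays to the concatenation; the evaluation functions stay the same in either case.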
Similar resources
Introduction of constraints in an acoustic-to-articulatory inversion method based on a hypercubic articulatory table
Our acoustic to articulatory inversion method exploits an original articulatory table structured in the form of a hypercube hierarchy. The articulatory space is decomposed into regions where the articulatory-to-acoustic mapping is linear. Each region is represented by a hypercube. The inversion procedure retrieves articulatory vectors corresponding to an acoustic entry from the hypercube table....
Unsupervised vocal-tract length estimation through model-based acoustic-to-articulatory inversion
Knowledge of vocal-tract (VT) length is a logical prerequisite for acoustic-to-articulatory inversion. Prior work has treated VT length estimation (VTLE) and inversion largely as separate problems. We describe a new algorithm for VTLE based on acoustic-to-articulatory inversion. Our inversion process uses the Maeda model (MM, [1,2]) and combines global search [3] and dynamic programming for tra...
Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion.
Acoustic-to-articulatory inversion is a difficult problem mainly because of the nonlinearity between the articulatory and acoustic spaces and the nonuniqueness of this relationship. To resolve this problem, we have developed an inversion method that provides a complete description of the possible solutions without excessive constraints and retrieves realistic temporal dynamics of the vocal trac...
Vocal tract inversion by cepstral analysis-by-synthesis using chain matrices
Acoustic-to-articulatory inversion for vowels is performed by cepstral analysis-by-synthesis, using chain-matrix calculation of vocal tract (VT) acoustics and the Maeda articulatory model. The derivative of the VT chain matrix with respect to the area function was calculated in a novel efficient manner, and used in the BFGS quasi-Newton method for optimizing a distance measure between input and...
Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM
This paper describes how non-linear formant trajectories, based on ‘trajectory HMM’ proposed by Tokuda et al., can be exploited under the framework of multiple-level segmental HMMs. In the resultant model, named a non-linear/linear multiple-level segmental HMM, speech dynamics are modeled as non-linear smooth trajectories in the formant-based intermediate layer. These formant trajectories are m...